How Xenopus laevis replicates DNA reliably even though its origins of replication are located and initiated stochastically



John Bechhoefer and Brandon Marshall
Department of Physics, Simon Fraser University, Burnaby, B.C., V5A 1S6, Canada

(Dated: February 9, 2008) 

DNA replication in Xenopus laevis is extremely reliable, failing to complete before cell division no 
more than once in 10,000 times; yet replication origin sites are located and initiated stochastically.
Using a model based on 1d theories of nucleation and growth and using concepts from extreme-value
statistics, we derive the distribution of replication times given a particular initiation function. We 
show that the experimentally observed initiation strategy for Xenopus laevis meets the reliability 
constraint and is close to the one that requires the fewest resources of a cell. 

PACS numbers: 87.15.Aa, 87.14.Gg, 87.17.Ee, 87.15.Ya 



DNA replication is one of the defining processes of 
living systems, and evolution has accordingly selected 
for highly reliable replication mechanisms. The South 
African clawed frog Xenopus laevis is an organism often 
used to study replication in eukaryotes [1]. The replication
of its embryonic cells is particularly interesting, as it
corresponds to a "stochastic limit," where the placement 
and initiation of the sites where DNA replication begins 
("replication origins") show significant stochasticity 0. 
As with humans, the Xenopus genome contains approximately
three billion bases [3]. Just after fertilization,
cells divide for twelve generations with an abbreviated 
cell cycle that is as short as 25 min. (at 20 °C). The 
cell cycle is divided into an "S phase" of about 20 min., 
when DNA is replicated, and a mitosis phase of about 
5 min., when chromosomes separate and the cell divides.
In order to replicate so many bases in so little time,
the cell initiates DNA replication at many ©(lO'^)] 
origins. For these embryonic cells, in contrast to the situation
for fully developed somatic cells, there is no sequence
dependence to the location of replication origins.
In addition, each origin initiates stochastically, with
no pre-determined time of initiation. The stochasticity 
in the location and initiation of replication origins leads 
to a potential difficulty: the typical time for replication 
is about 20 min., but the maximum allowable time is 
only 25 min. In particular, embryonic cells lack the efficient
checkpoint mechanisms that somatic cells have
to pause the cell cycle to allow for unusually slow repli- 
cation. The cell must replicate by the time it divides, 
or die. But empirically, such a "mitotic catastrophe" 
is rare, occurring in fewer than 1 in $10^{4}$ replications. How can one reconcile
the variations in S-phase duration due to the stochastic 
placement and initiation of origins with the high reliabil- 
ity of replication? 

In the biological literature, the above is known as the "random-completion problem" and has been an unsettled question for over twenty years. In its simplest form, randomly placed origins imply an exponential distribution of origin separations and, hence, a small number of very large gaps that take a long time to replicate. Two approaches to a solution have been advanced. The first notes evidence that the spacing of origins is not completely random and that any regularity in the spacing of origins will tend to suppress large gaps. However, in isolation, such a scenario is fragile: if a single origin fails to initiate, it will create a much larger gap than usually exists. The second approach draws on a recent experimental result that origins initiate throughout S phase and, indeed, that the rate of initiation of origins, $I(t)$ (initiations per time per length of unreplicated genome), increases significantly as S phase proceeds [11], [13]. Intuitively, initiating origins throughout S phase allows the cell to "fill in gaps" and avoid unusually long delays.

In this Letter, we first calculate, following theories of nucleation and growth in one dimension [13], the distribution of replication times $p_{\mathrm{rep}}(t)$ given an initiation function $I(t)$ and a constant "fork velocity" $v$ describing the symmetric growth of replication domains. We find that an increasing $I(t)$ can ensure replication at the required level of reliability, even in the worst case of completely random origin spacing. We then show that the specific $I(t)$ observed in in vitro experiments is close to an optimal $I(t)$ that minimizes the amount of cellular replication machinery (polymerases, helicases, etc.) that a cell is required to supply.

FIG. 1: Schematic of DNA replication model. Space-time diagram showing multiple origins (filled circles), each expanding symmetrically at constant velocity. Domains coalesce when they meet (open circles).



Our derivation of $p_{\mathrm{rep}}$ uses a model inspired by the Kolmogorov-Johnson-Mehl-Avrami theory of crystallization kinetics, which is a stochastic model with three elements: nucleation (initiation) of ordered (replicated) domains; symmetric growth of these domains; and coalescence of domains that grow into each other. (See Fig. 1.) Using such a model, we showed that the fraction $f$ of DNA replicated on an infinite domain at a time $t$ after the start of S phase is given by



$$f(t) = 1 - e^{-2vh(t)}, \tag{1}$$



where $h(t) = \int_0^t g(t')\,dt'$, $g(t) = \int_0^t I(t')\,dt'$, and $I(t)$ is the initiation function ($\geq 0$) [16]. Here, $v$ is the fork velocity, and $f(t)$ typically has a sigmoidal shape. Equation (1) predicts that it will take infinite time to replicate all the DNA ($f = 1$); but obviously, the replication time should be finite on a finite-length genome. Because the location and time of initiation of origins are stochastic, the time to finish replication will also be stochastic.
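
As an illustration of Eq. (1), the integrals defining $g$ and $h$ have closed forms when the initiation function is a power law, $I(t) = I_n t^n$, so the replicated fraction can be evaluated directly. The sketch below is only an example: the function names and the parameter values ($I_n = 10^{-4}$ kb$^{-1}$ min$^{-2}$, $n = 1$, $v = 0.6$ kb/min) are illustrative assumptions, not values fitted in this Letter.

```python
import math


def g_and_h(t, I_n, n):
    """Closed-form g(t) and h(t) for the assumed power law I(t) = I_n * t**n."""
    g = I_n * t ** (n + 1) / (n + 1)              # g(t) = integral of I from 0 to t
    h = I_n * t ** (n + 2) / ((n + 1) * (n + 2))  # h(t) = integral of g from 0 to t
    return g, h


def replicated_fraction(t, I_n, n, v):
    """Replicated fraction f(t) = 1 - exp(-2 v h(t)) on an infinite genome."""
    _, h = g_and_h(t, I_n, n)
    return 1.0 - math.exp(-2.0 * v * h)


# Illustrative (assumed) parameters: lengths in kb, times in minutes.
for t in (10.0, 30.0, 60.0, 90.0):
    print(t, replicated_fraction(t, I_n=1e-4, n=1, v=0.6))
```

The printed values rise sigmoidally toward 1 without ever reaching it, which is the infinite-genome behavior noted above.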

In order to calculate the distribution of replication times $p_{\mathrm{rep}}(t)$, we first note that, except for edge effects, there is a one-to-one mapping from replication origins to coalescences of replication domains. (See Fig. 1.) Because the evolution of domains is deterministic once the origin has initiated, one can derive the distribution of coalescence times, $p_c(t)$, from the initiation function $I(t)$. In [3], we derived the density of non-replicated domains ("holes") of size $x$ at time $t$ to be $n_h(x,t) = g^2(t)\exp[-g(t)x - 2vh(t)]$. Since a coalescence event is equivalent to a hole of zero size ($x = 0$), we can write the normalized distribution $p_c(t)$ as



$$p_c(t) = \frac{2vL}{N_o}\, g^2(t)\, e^{-2vh(t)}, \tag{2}$$

where $N_o$ is the total number of origins along a genome of length $L$ initiated throughout S phase.
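
The one-to-one mapping between origins and coalescences can be checked numerically. Assuming Eq. (2) has the form $p_c(t) = (2vL/N_o)\,g^2(t)\,e^{-2vh(t)}$, the number of coalescences per unit length, $\int 2v g^2 e^{-2vh}\,dt$, should equal the number of initiations per unit length, $\int I e^{-2vh}\,dt$. The sketch below compares the two integrals with a midpoint rule for a power-law $I(t) = I_n t^n$; the function name and parameter values are illustrative assumptions.

```python
import math


def origins_and_coalescences(I_n, n, v, t_end, steps=20000):
    """Integrate initiation and coalescence rates per unit length up to t_end."""
    dt = t_end / steps
    origins = 0.0
    coalescences = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                            # midpoint of the k-th step
        g = I_n * t ** (n + 1) / (n + 1)
        h = I_n * t ** (n + 2) / ((n + 1) * (n + 2))
        unreplicated = math.exp(-2.0 * v * h)         # fraction not yet replicated
        origins += I_n * t ** n * unreplicated * dt   # I(t) acts only on unreplicated DNA
        coalescences += 2.0 * v * g * g * unreplicated * dt
    return origins, coalescences


# Assumed example values (kb and minutes); t_end is long enough that f is essentially 1.
o, c = origins_and_coalescences(I_n=1e-4, n=1, v=0.6, t_end=200.0)
print(o, c)
```

The two numbers agree to within the quadrature error, so dividing the coalescence rate by $N_o/L$ gives a normalized distribution.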

As Fig. 1 shows, the time to complete replication corresponds to the last coalescence event. Since there are $N_o$ coalescences, the problem of determining the typical time of the last coalescence is equivalent to asking, "Drawing $N_o$ coalescences from a distribution $p_c(t)$, what is the largest time one expects to occur?" Such questions are the subject of the field of extreme-value statistics [18], where an analog to the central-limit theorem holds: given a parent distribution whose maximum value is unbounded and whose tail decays asymptotically at least as fast as an exponential (conditions satisfied here), the maximum value drawn in $N_o$ trials will, for $N_o$ large, tend to a Gumbel distribution, $p_g(t) = (1/\beta)\exp[-\tau - \exp(-\tau)]$, where the scaled time $\tau = (t - t^*)/\beta$, with $t^*$ the mode of the distribution and $\beta$ its width [13]. An elementary calculation [17] shows that for Eq. (2) the width $\beta$ is given by $1/[2vg(t^*)]$ and the mode $t^*$ by



$$F_c(t^*) = 1 - 1/N_o, \tag{3}$$

where $F_c(t) = \int_0^t p_c(t')\,dt'$ is the cumulative probability distribution function (CDF) of the probability distribution function (PDF) $p_c(t)$. From Eq. (2), the CDF is, asymptotically for large $t$, given by

$$F_c(t) \simeq 1 - \frac{L}{N_o}\, g(t)\, e^{-2vh(t)}. \tag{4}$$

Equation (4) is derived by integrating $p_c(t)$ by parts and dropping sub-dominant terms and, with Eq. (3), leads to a transcendental equation for the magnitude of $I(t)$.

In Fig. 2, we show the results of Monte-Carlo simulations of the replication-time distribution for various $I(t)$ functions. In all cases, we adjusted the amplitude of $I(t)$ so that the mode of $p_{\mathrm{rep}}(t)$ is at $t^* = 38$ min., which corresponds to the mode deduced from the $I(t)$ measured in the in vitro experiments. (For the in vivo experiments, $t^* \approx 20$ min.) The solid lines are fits to a Gumbel distribution. The parameters deduced (the $\beta$s) are consistent with the values predicted in the paragraph above.

[Plot for Fig. 2: replication-time distributions versus replication time (min), for delta-function and power-law, $I(t) \sim t^n$, initiation.]

FIG. 2: Replication-time distribution function, fixing the mode to be $t^* = 38$ min. Markers are results from Monte Carlo simulations (3000 trials per simulation); solid lines are fits to the Gumbel distribution.
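
The Letter does not spell out how its simulations were implemented, so the following is only a minimal sketch of one way to sample replication times. It assumes a power-law $I(t) = I_n t^n$ and uses the standard "phantom nuclei" trick: initiation attempts are drawn as a Poisson process over the whole genome, and attempts that fall on already-replicated DNA are overridden by the earlier passing fork. All names and parameter values (a short 1000 kb genome, $I_n = 10^{-4}$ kb$^{-1}$ min$^{-2}$, $n = 1$, $v = 0.6$ kb/min, 200 trials) are illustrative assumptions chosen to keep the example fast.

```python
import random


def sample_initiations(L, I_n, n, t_max, rng):
    """Draw initiation attempts (x, t) with intensity I_n * t**n per length and time."""
    events, s = [], 0.0
    scale = (n + 1) / (L * I_n)
    while True:
        s += rng.expovariate(1.0)             # unit-rate Poisson arrivals
        t = (s * scale) ** (1.0 / (n + 1))    # invert the integrated intensity
        if t > t_max:
            return events
        events.append((rng.uniform(0.0, L), t))


def completion_time(events, L, v):
    """Time at which the last point of [0, L] is replicated, forks moving at speed v."""
    if not events:
        raise ValueError("at least one initiation is required")
    ev = sorted(events)
    xs = [x for x, _ in ev]
    ts = [t for _, t in ev]
    # Two sweeps give, at each site, the earliest arrival of any fork (or its own firing).
    for i in range(1, len(ev)):
        ts[i] = min(ts[i], ts[i - 1] + (xs[i] - xs[i - 1]) / v)
    for i in range(len(ev) - 2, -1, -1):
        ts[i] = min(ts[i], ts[i + 1] + (xs[i + 1] - xs[i]) / v)
    # Genome ends are reached by a single fork; interior gaps close where two forks meet.
    last = max(ts[0] + xs[0] / v, ts[-1] + (L - xs[-1]) / v)
    for i in range(len(ev) - 1):
        last = max(last, 0.5 * (ts[i] + ts[i + 1] + (xs[i + 1] - xs[i]) / v))
    return last


def simulate(trials, L, I_n, n, v, t_max, seed=0):
    """Return a list of replication-completion times, one per trial."""
    rng = random.Random(seed)
    return [completion_time(sample_initiations(L, I_n, n, t_max, rng), L, v)
            for _ in range(trials)]


times = simulate(trials=200, L=1000.0, I_n=1e-4, n=1, v=0.6, t_max=200.0)
print(min(times), sum(times) / len(times), max(times))
```

A histogram of `times` can then be compared with a Gumbel fit. Here `t_max` must be chosen well beyond the typical completion time, because initiation attempts after `t_max` are not generated.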

The striking implication of Fig. 2 is that one can vary the width of the replication-time distribution $p_{\mathrm{rep}}$ by choosing an initiation function $I(t)$ that increases throughout S phase. Initiating all the origins at the beginning of S phase [$I(t) = I_\delta\,\delta(t)$] leads to the broadest possible distribution. Exploring power-law initiation functions $I(t) = I_n t^n$ (with $I_n$ fixed by the $t^*$ constraint), we see that as one progresses from constant ($n = 0$) to linear ($n = 1$) to quadratic ($n = 2$) initiation functions, the width of $p_{\mathrm{rep}}$ is progressively reduced. The replication-time distribution can also be calculated using the experimental $I(t)$ [12] (not shown). The experimental $I(t)$ is close to a quadratic curve, and its distribution is indistinguishable from the $n = 2$ case.
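
The narrowing with increasing $n$ can be reproduced from the extreme-value estimates. The sketch below assumes that the mode condition reduces to $L\,g(t^*)\,e^{-2vh(t^*)} = 1$ (what one obtains if the asymptotic CDF is $1 - (L/N_o)\,g\,e^{-2vh}$ and it is set equal to $1 - 1/N_o$) and that the Gumbel width is $\beta = 1/[2vg(t^*)]$. It solves for the power-law amplitude $I_n$ by bisection, taking $L = 3 \times 10^6$ kb, $v = 0.6$ kb/min, and $t^* = 38$ min from the text; the function names are hypothetical.

```python
import math


def mode_condition(I_n, n, v, L, t_star):
    """log of L * g(t*) * exp(-2 v h(t*)); zero when t* is the mode."""
    g = I_n * t_star ** (n + 1) / (n + 1)
    h = I_n * t_star ** (n + 2) / ((n + 1) * (n + 2))
    return math.log(L * g) - 2.0 * v * h


def solve_amplitude(n, v, L, t_star):
    """Bisect for the larger root I_n of the (assumed) mode condition."""
    # The condition peaks, as a function of I_n, where 2 v h(t*) = 1.
    lo = (n + 1) * (n + 2) / (2.0 * v * t_star ** (n + 2))
    if mode_condition(lo, n, v, L, t_star) <= 0.0:
        raise ValueError("no solution: genome too short for this t*")
    hi = 2.0 * lo
    while mode_condition(hi, n, v, L, t_star) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mode_condition(mid, n, v, L, t_star) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gumbel_width(I_n, n, v, t_star):
    """Assumed width beta = 1 / (2 v g(t*)), in the same time units as t*."""
    g = I_n * t_star ** (n + 1) / (n + 1)
    return 1.0 / (2.0 * v * g)


for n in (0, 1, 2):
    amp = solve_amplitude(n, v=0.6, L=3.0e6, t_star=38.0)
    print(n, amp, gumbel_width(amp, n, v=0.6, t_star=38.0))
```

Under these assumptions the width decreases monotonically from $n = 0$ to $n = 2$, in line with the trend described above.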

It would thus appear that the cell can have arbitrarily reliable replication (an arbitrarily narrow distribution $p_{\mathrm{rep}}$) simply by arranging for its initiation curve to increase fast enough. In fact, the situation is more subtle. Even when all origins are initiated at the beginning of S phase, it is possible to replicate with arbitrary reliability simply by having enough origins. While it is true that there will be a few unusually long gaps that will set the replication time, these gaps may be reduced arbitrarily if one starts with enough replication origins. We thus propose an alternate way of viewing the random-completion problem: Instead of fixing the number of origins and looking at the replication times for different strategies, we fix a time $t^{**}$ at which either a cell has finished replication or it dies. Since evolution selects on the basis of mortality, the replication parameters ($I(t)$, $v$, the number of potential origins, etc.) should be a consequence of this selection, and not vice versa. Choosing $t^{**}$ to be the cell-cycle time (25 min.) and allowing a failure rate of $10^{-4}$, we calculate, for various forms of $I(t)$, the replication parameters required to meet the reliability constraint. (Our results depend only logarithmically on the failure rate.)
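
The logarithmic dependence on the failure rate follows from the Gumbel tail. If the replication time is Gumbel distributed with mode $t^*$ and width $\beta$, the probability of finishing after $t^{**}$ is $1 - \exp[-e^{-(t^{**} - t^*)/\beta}]$, so the required margin $t^{**} - t^*$ grows roughly as $\beta \ln(1/\epsilon)$ for a failure rate $\epsilon$. The sketch below illustrates this; the function names and the example value $\beta = 1$ min are assumptions.

```python
import math


def failure_probability(t_deadline, t_star, beta):
    """Gumbel tail: probability that replication finishes after t_deadline."""
    tau = (t_deadline - t_star) / beta
    return -math.expm1(-math.exp(-tau))   # 1 - exp(-exp(-tau)), computed stably


def required_margin(beta, failure_rate):
    """Margin t** - t* at which the Gumbel tail equals failure_rate."""
    return -beta * math.log(-math.log1p(-failure_rate))


for eps in (1e-2, 1e-4, 1e-8):
    print(eps, required_margin(beta=1.0, failure_rate=eps))
```

Squaring the failure rate (for example, going from $10^{-4}$ to $10^{-8}$) only doubles the required margin.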

In order to compare with experiment, we must confront 
a further problem. While the in vivo replication time is 
estimated to be 20 min., the in vitro experiments require 
nearly twice this time to replicate. We must thus make 
additional assumptions to translate the in vitro experi- 
mental results to the in vivo situation. In fact, we can do 
this with one simple assumption. In earlier studies, it was 
assumed that the replication fork velocity v is constant 
throughout S phase. The original analysis of the in vitro 
Xenopus data thus estimated an average fork velocity of 
0.6 kb/min. More recent work has shown that the
fork velocity starts at 1.1 kb/min. at the beginning of S 
phase and then decreases monotonically to 0.3 kb/min. 
at the end of S phase. We speculate that the longer time 
for the in vitro S phase is caused by this reduction in fork 
velocity - perhaps because some protein concentrations 
are not kept constant. With this single modification -
$v = 1.1$ rather than 0.6 kb/min. - we shall find results
consistent with the in vivo observations. 

In Fig. 3, we show results of simulations that constrain the replications to finish by $t^{**} = 25$ min., allowing a failure rate of $10^{-4}$. We see that it is indeed possible to find amplitudes for $I(t)$ that satisfy the reliability constraint.

While it is always possible to choose an amplitude (e.g., $I_\delta$ or $I_n$) to satisfy the reliability constraint, each choice will have definite implications for the amount of cell resources that are required for its implementation. One may then ask whether there is a "best" strategy for initiating origins (while satisfying the reliability constraint). If so, how close is the experimental $I(t)$ to the optimum?

To answer such questions, one must first define a mea- 
sure for cell resources. We have considered two possibil- 
ities among many that can be imagined: the number of 
origins initiated throughout S phase and the maximum 
number of replication forks required. The first choice 
would be relevant if the origin-initiation proteins were 
limited. The second would be relevant if the number of polymerases (or other parts of the replication machinery) that needed to be active at one time limited the rate of replication. We find qualitatively the same results in both cases [21].

FIG. 3: Replication-time distribution function, fixing the mortality rate at $t^{**} = 25$ min. to be $10^{-4}$. (The area to the right of the dashed line of each probability distribution function is $10^{-4}$.) Markers are results from Monte Carlo simulations (20000 trials per simulation); solid lines are fits to the Gumbel distribution.

Intuitively, there should be an optimum for the consumption of resources. Within the fork-density scenario, initiating all origins at the beginning leads to a high initial fork density. Holding off initiating until later in S phase helps by allowing the machinery of replication forks to be repeatedly reused. If the cell waits too long to begin replication, then it is essentially shortening S phase, which requires many origins (and forks). Thus, one expects an optimum. We have explored this by calculating the maximum number of forks, $n_{\max}$, required in several cases. First, we calculate it for delta-function initiation ($n_{\max} = I_{\mathrm{delta}}$). Next, we numerically calculate $n_{\max}$ for the power-law case. Finally, we use the calculus of variations to calculate the optimal $I(t)$, denoted $I_{\mathrm{opt}}(t)$, that minimizes the maximum number of required forks, subject to the reliability constraint. To calculate $I_{\mathrm{opt}}$, we note that the number of replication forks is given by $n_f(t) = 2g(t)\exp[-2vh(t)]$ [3]. One can extract the maximum fork density using a technique familiar from control theory ($\mathcal{H}_\infty$ metric) [22]. We thus write



...[I(t)] = lim ...

The associated Euler-Lagrange equation turns out to be independent of the exponent $p$. We find



$$\ddot{h}(t) = 2v\,\dot{h}^2(t), \tag{6}$$



where we recall that $\ddot{h}(t) = I(t)$ and $\dot{h}(t) = g(t)$. Solving Eq. (6) subject to the boundary condition $h(0) = 0$ gives Eq. (7), which implies that the fork density $n_f = 1/vt^*$ is constant throughout S phase.

FIG. 4: Maximum required fork density, for different replication schemes.
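
The constant fork density can be checked by integrating Eq. (6) directly. The following is a sketch rather than the Letter's own derivation: it assumes the fork density $n_f(t) = 2g(t)\,e^{-2vh(t)}$ and introduces an integration constant $t_0$, which must equal $t^*$ for the result to match $n_f = 1/vt^*$.

```latex
% Sketch: integrate Eq. (6) with g = \dot{h}; t_0 is an integration constant.
\begin{align*}
\dot{g} &= 2 v g^2
  \;\Longrightarrow\; g(t) = \frac{1}{2v\,(t_0 - t)}, \\
h(t) &= \int_0^t g(t')\,dt'
      = -\frac{1}{2v}\ln\!\left(1 - \frac{t}{t_0}\right),
      \qquad h(0) = 0, \\
n_f(t) &= 2 g(t)\, e^{-2 v h(t)}
        = \frac{1}{v\,(t_0 - t)}\left(1 - \frac{t}{t_0}\right)
        = \frac{1}{v t_0}, \\
I(t) &= \dot{g}(t) = \frac{1}{2v\,(t_0 - t)^2}
        \qquad (0 < t < t_0).
\end{align*}
```

Because $g(0^+) = 1/(2vt_0)$ is nonzero, this solution also requires a delta-function burst of initiations at $t = 0$, and $I(t)$ diverges as $t \to t_0$.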

In Fig. 4, we summarize the results of these investigations. The dashed line at the top gives the fork density required to make the delta-function $I(t)$ meet the reliability constraint. The solid curve represents the fork density required for power-law initiations. As we anticipated, the curve has a minimum (between $n = 1$ and 2). The fine-dashed line, which lies close to the minimum value of the power-law case, is the experimental maximum fork density. Finally, the broad-dashed line gives the optimal fork density ($1/vt^*$).
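
For the power-law case, the maximum of the fork density can be located without a full simulation. Assuming $n_f(t) = 2g(t)\,e^{-2vh(t)}$ with $I(t) = I_n t^n$, setting $dn_f/dt = 0$ gives $I = 2vg^2$, that is, $t_{\mathrm{peak}}^{\,n+2} = (n+1)^2/(2vI_n)$. The sketch below evaluates this closed form and cross-checks it with a brute-force scan. The amplitude used is an arbitrary illustrative value; in the Letter, $I_n$ would be fixed by the reliability constraint.

```python
import math


def fork_density(t, I_n, n, v):
    """Assumed fork density n_f(t) = 2 g(t) exp(-2 v h(t)) for I(t) = I_n * t**n."""
    g = I_n * t ** (n + 1) / (n + 1)
    h = I_n * t ** (n + 2) / ((n + 1) * (n + 2))
    return 2.0 * g * math.exp(-2.0 * v * h)


def max_fork_density(I_n, n, v):
    """Peak time and peak fork density from the condition I = 2 v g**2."""
    t_peak = ((n + 1) ** 2 / (2.0 * v * I_n)) ** (1.0 / (n + 2))
    return t_peak, fork_density(t_peak, I_n, n, v)


def max_fork_density_scan(I_n, n, v, t_end, steps=100000):
    """Brute-force maximum of the fork density on a uniform time grid."""
    return max(fork_density(k * t_end / steps, I_n, n, v) for k in range(steps + 1))


for n in (0, 1, 2):
    t_peak, n_max = max_fork_density(I_n=1e-4, n=n, v=0.6)
    print(n, t_peak, n_max, max_fork_density_scan(1e-4, n, 0.6, 5.0 * t_peak))
```

Repeating this with amplitudes that each satisfy the reliability constraint would give a curve of the kind summarized in Fig. 4.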

Although the optimal fork density is lower than that 
observed, it clearly does not represent a physiologically 
possible case. It is unrealistic to expect the perfect coor- 
dination implied by the delta function at the beginning 
of S phase. More serious, at the end of S phase, Eq. (7)
implies that the rate of initiation diverges, along with 
the total number of activated origins. Still, we note that 
the qualitative shape of the curve shares the quadrati- 
cally increasing form of the experimental result. More 
generally, it would be surprising if the initiation program 
were identical to the optimum (even if one were to limit 
the space of functions to those that are physiologically 
achievable). We note that the minimum is clearly broad: 
there is little difference in required fork density between 
a linear and a quadratic $I(t)$. The main point is that
there are some strategies - most notably the initiation of
all origins at the beginning of S phase - that are clearly
bad, and these differ from the observed $I(t)$.

In conclusion, we have calculated the distribution of replication times $p_{\mathrm{rep}}$ for the stochastic limit of replication, where origins are placed randomly and initiate stochastically at a rate $I(t)$. Choosing an $I(t)$ that increases with time narrows $p_{\mathrm{rep}}$ and increases the reliability of replication. Using the known mortality rates and length of the cell cycle, we gave a quantitative interpretation to the random-completion problem and showed that one can meet the reliability constraint using an arbitrary $I(t)$. Different $I(t)$ functions demand different resources from the cell. Measuring this resource use by the maximum required fork density, we show that the experimentally observed form of $I(t)$ is close to optimum.
In the future, it would be interesting to consider the ef- 
fects of any regularity in origin spacing. While we have 
shown that reliable replication may be achieved even in 
the worst case of random spacing of origins, there is evi- 
dence for some regularity. It would also be interesting to 
measure the replication-time distribution directly. While 
determining the time at which the last base (of three 
billion) replicates is unrealistic, one might be able to de- 
termine when a given fraction (e.g., 90 or 95%) of origins 
have replicated. It is straightforward to generalize the 
methods presented here to determine the distribution of 
times required to reach a given replication fraction.